Roseibacterium elongatum Suzuki et al. 2006 is a pink-pigmented and bacteriochlorophyll a-producing
representative of the Roseobacter group within the alphaproteobacterial family
Rhodobacteraceae. Representatives of the marine 'Roseobacter group' were found to be
abundant in the ocean and play an important role in global and biogeochemical processes. In
the present study we describe the features of R. elongatum strain OCh 323T together with its
genome sequence and annotation. The 3,555,102 bp long genome consists of one circular
chromosome with no extrachromosomal elements and is one of the smallest known
Roseobacter genomes. It contains 3,540 protein-coding genes and 59 RNA genes. Genome
analysis revealed the presence of a photosynthetic gene cluster, which putatively enables a
photoheterotrophic lifestyle. Gene sequences associated with quorum sensing, motility,
surface attachment, and thiosulfate and carbon monoxide oxidation could be detected. The
genome was sequenced as part of the activities of the Transregional Collaborative Research
Centre 51 (TRR51) funded by the German Research Foundation (DFG).



Introduction 

Strain OCh 323T (= DSM 19469T = CIP 107377T = JCM 11220T) is the type strain of
Roseibacterium elongatum in the bispecific genus Roseibacterium [1], with
R. beibuensis [2] being the second species in the genus. The genus Roseibacterium
belongs to the marine Roseobacter group, which was shown to be ubiquitous in the
oceans of the world, especially in coastal and polar oceans [3,4]. The strain was
isolated from sand located at Monkey Mia, Shark Bay, on the west coast of
Australia [1]. The genus Roseibacterium was named after the Latin adjective roseus
('rose, pink') and the Greek noun bakterion ('rod'); Roseibacterium ('pink,
rod-shaped bacterium'). The species epithet elongatum refers to the Latin adjective
elongatum ('elongated, stretched out') [1]. Current PubMed records do not indicate
any follow-up research with strain OCh 323T after the initial description of
R. elongatum [1].

In this study we analyzed the genome sequence of R. elongatum DSM 19469T. We present
a description of the genome sequencing and annotation and a summary classification
together with a set of features for strain DSM 19469T, including novel aspects of its
phenotype and features of the organism.

Classification and features 
16S rRNA gene analysis

Figure 1 shows the phylogenetic neighborhood of R. elongatum DSM 19469T in a 16S rRNA
gene based tree. The sequences of the two identical 16S rRNA gene copies in the genome
do not differ from the previously published 16S rRNA gene sequence (AB601471).

A representative genomic 16S rRNA gene sequence of R. elongatum DSM 19469T was
compared with the Greengenes database [22] for determining the weighted relative
frequencies of taxa and (truncated [23]) keywords as previously described [5]. The
most frequently occurring genera were Rhodovulum (35.1%), Jannaschia (13.5%),
Dinoroseobacter (10.6%), Rhodobacter (9.6%) and Roseobacter (8.5%) (89 hits in total).
Regarding the two hits to sequences from members of the species, the average identity
within HSPs was 100.0%, whereas the average coverage by HSPs was 99.7%. Among all
other species, the one yielding the highest score was Dinoroseobacter shibae
(NC_009952), which corresponded to an identity of 95.7% and a HSP coverage of 100.1%.
(Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which
is not an authoritative source for nomenclature or classification.) The
highest-scoring environmental sequence was AF513932 (Greengenes short name
'Rhodobacter group clone LA4-B3'), which showed an identity of 99.4% and a HSP
coverage of 99.9%. The most frequently occurring keywords within the labels of all
environmental samples that yielded hits were 'microbi' (4.3%), 'mat' (2.3%), 'sea'
(2.0%), 'marin' (2.0%) and 'coral' (1.9%) (157 hits in total). The most frequently
occurring keywords within the labels of those environmental samples that yielded hits
of a higher score than the highest scoring species were 'group, rhodobact' (33.8%) and
'rhodobacteracea' (32.4%) (2 hits in total). These keywords fit well to the known
ecology (and phylogenetic relationships) of R. elongatum DSM 19469T.



[Figure 1: phylogenetic tree. Taxa shown, with accession numbers in parentheses and
asterisks as explained in the caption; two collapsed clades comprise 36 and 31 species.]

Wenxinia marina (DQ640643)
Oceanicola granulosus (AY424896) *
Roseisalinus antarcticus (AJ605747)
Ketogulonicigenium vulgare (AF136849)
Marivita cryptomonadis (EU512919)
Gaetbulicola byunsanensis (FJ467624)
Pontibaca methylaminivorans (AJ505788)
Donghicola eburneus (DQ667965)
Octadecabacter arcticus (U73725)
Nereida ignava (AJ748748) *
Tateyamaria omphalii (AB193438)
Maribius salinus (AY906863)
Hwanghaeicola aestuarii (FJ230842)
Profundibacterium mesophilum (JF776971) *
Pacificibacter maritimus (AB558927)
Litoreibacter albidus (AB518881)
Planktotalea frisia (FJ882052)
Jannaschia helgolandensis (AJ438157)
Thalassobacter stenotrophicus (AJ631302)
Celeribacter neptunius (FJ535354)
Huaishuia halophila (FJ436725)
Vadicella arenosi (AB564595)
Pseudoruegeria aquimaris (DQ675021)
Dinoroseobacter shibae (AJ534211) **
Roseibacterium elongatum (IMG2522989585) *
Tranquillimonas alkanivorans (AB302386)
Tropicimonas isoalkanivorans (AB302379)
Lutimaribacter saemankumensis (EU336981)
Maritimibacter alkaliphilus (DQ915443) *
Roseicyclus mahoneyensis (AJ315682)
Actibacterium mucosum (HE590855)
Hasllibacter halocynthiae (FJ638616)



Figure 1. Phylogenetic tree highlighting the position of R. elongatum relative to the type strains of the type species of 
the other genera within the family Rhodobacteraceae. The tree was inferred from 1,331 aligned characters of the 16S 
rRNA gene sequence under the maximum likelihood (ML) criterion as previously described [5]. Rooting was done ini- 
tially using the midpoint method [6] and then checked for its agreement with the current classification (Table 1). The 
branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are 
support values from 600 ML bootstrap replicates (left) and from 1,000 maximum-parsimony bootstrap replicates (right) 
if larger than 60% [5]. Lineages with type strain genome sequencing projects registered in GOLD [7] are labeled with 
one asterisk, those also listed as 'Complete and Published' with two asterisks [8-11]. 
Morphology and physiology

Cells of strain OCh 323T are Gram-negative, non-motile and rod-shaped, 1.6-10.0 µm in
length and 0.5-0.8 µm in width (Figure 2). Colonies are circular, smooth, convex and
glistening, opaque and pink-pigmented. Optimum growth occurs at a temperature of
27-30°C and a pH of 7.5-8.0. Cells can grow in the presence of 0.5-7.5% NaCl but do
not grow in the absence of NaCl. Cells are positive for urease activity but do not
show nitrate reductase or phosphatase activities. They are negative in the
Voges-Proskauer test but the ONPG reaction is positive. Cells do not produce indole or
H2S. Gelatin is hydrolyzed, but alginate, starch and Tween 80 are not. Cells do not
utilize acetate, citrate, D-glucose, DL-malate, ethanol, pyruvate or succinate. Acid
is not produced from D-fructose, D-glucose or lactose (all data from [1]).

In this study the utilization of carbon compounds by R. elongatum DSM 19469T grown at
28°C was also determined using Generation-III microplates in an OmniLog phenotyping
device (BIOLOG Inc., Hayward, CA, USA). The microplates were inoculated with a cell
suspension at a cell density of 95-96% turbidity and dye IF-A. Further additives were
vitamin, micronutrient and sea-salt solutions, which had to be added for dealing with
such marine bacteria [24]. The plates were sealed with parafilm to avoid a loss of
fluid. The measurement data were exported and further analyzed with the opm package
for R [7,25], using its functionality for statistically estimating parameters from the
respiration curves, such as the maximum height, and automatically translating these
values into negative, ambiguous, and positive reactions.
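
The translation of a curve parameter into a reaction class can be illustrated with a minimal Python sketch. It is not the opm implementation: opm is an R package with its own estimation and discretization methods. The fixed cutoffs (`neg_cutoff`, `pos_cutoff`), the function names and the use of the largest measured value as the maximum height are all assumptions made here purely for illustration.

```python
from typing import Dict, List


def max_height(curve: List[float]) -> float:
    """Crude stand-in for the 'maximum height' parameter: the largest reading."""
    return max(curve)


def classify_reaction(height: float,
                      neg_cutoff: float = 100.0,
                      pos_cutoff: float = 150.0) -> str:
    """Map a maximum height to a reaction class.

    Both cutoffs are hypothetical values, not taken from the study.
    """
    if height < neg_cutoff:
        return "negative"
    if height > pos_cutoff:
        return "positive"
    return "ambiguous"


def classify_plate(curves: Dict[str, List[float]]) -> Dict[str, str]:
    """Classify every well of a plate given its respiration curve."""
    return {well: classify_reaction(max_height(c)) for well, c in curves.items()}
```

With such a scheme, wells whose maximum height falls between the two cutoffs stay explicitly unassigned instead of being forced into a positive or negative call.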

The following substrates were utilized in the Generation-III plates: positive control,
pH 6, 1% NaCl, 4% NaCl, D-galactose, D-fucose, L-fucose, L-rhamnose, 1% sodium lactate,
D-arabitol, myo-inositol, rifamycin SV, L-aspartic acid, L-glutamic acid, L-histidine,
L-serine, D-glucuronic acid, quinic acid, L-lactic acid, citric acid, α-keto-glutaric
acid, D-malic acid, L-malic acid, nalidixic acid and sodium formate.



[Figure 2; scale bar: 10 µm]




According to the Generation-III plates, the strain is negative for dextrin, D-maltose,
D-trehalose, D-cellobiose, β-gentiobiose, sucrose, D-turanose, stachyose, pH 5,
D-raffinose, α-D-lactose, D-melibiose, β-methyl-D-galactoside, D-salicin,
N-acetyl-D-glucosamine, N-acetyl-β-D-mannosamine, N-acetyl-D-galactosamine,
N-acetyl-neuraminic acid, 8% NaCl, D-glucose, D-mannose, D-fructose,
3-O-methyl-D-glucose, inosine, fusidic acid, D-serine, D-sorbitol, D-mannitol,
glycerol, D-glucose-6-phosphate, D-fructose-6-phosphate, D-aspartic acid, D-serine,
troleandomycin, minocycline, gelatin, glycyl-L-proline, L-alanine, L-arginine,
L-pyroglutamic acid, lincomycin, guanidine hydrochloride, niaproof, pectin,
D-galacturonic acid, L-galactonic acid-gamma-lactone, D-gluconic acid, glucuronamide,
mucic acid, D-saccharic acid, vancomycin, tetrazolium violet, tetrazolium blue,
p-hydroxy-phenylacetic acid, methyl pyruvate, D-lactic acid methyl ester,
bromo-succinic acid, lithium chloride, potassium tellurite, Tween 40,
γ-amino-n-butyric acid, α-hydroxy-butyric acid, β-hydroxy-butyric acid,
α-keto-butyric acid, acetoacetic acid, propionic acid, acetic acid, aztreonam, butyric
acid, sodium bromate and the negative control.

In a previous study by Suzuki et al. [1], bacterial growth on nine substrates was
tested for R. elongatum OCh 323T. According to [1], none of the carbon sources were
utilized. In contrast, the OmniLog assay resulted in more than fifteen positive
reactions, including sugars, carboxylic acids and amino acids. This observation can be
explained by the higher sensitivity of respiration measurements compared to growth
measurements [26]. For instance, the positive reactions detected only in the OmniLog
instrument but not by Suzuki et al. [1] might be caused by substrates that were only
partially metabolized.

Chemotaxonomy 

The principal cellular fatty acids of strain OCh 323T are C18:1 (68%), C16:0 (12%),
C18:0 (8%), C19:0 cyclo (4%), C16:0 2-OH (2%), C14:0 3-OH (2%), C15:0 (1%),
C17:0 (1%), and C16:1 (1%), whereas C14:0 and C18:2 are only found in traces (all data
from [1]).



Table 1. Classification and general features of R. elongatum OCh 323T in accordance with the MIGS
recommendations [12] published by the Genomic Standards Consortium [13].



MIGS ID     Property                 Term                                           Evidence code
            Current classification   Domain Bacteria                                TAS [14]
                                     Phylum Proteobacteria                          TAS [15]
                                     Class Alphaproteobacteria                      TAS [16,17]
                                     Order Rhodobacterales                          TAS [17,18]
                                     Family Rhodobacteraceae                        TAS [17,19]
                                     Genus Roseibacterium                           TAS [1]
                                     Species Roseibacterium elongatum               TAS [1]
                                     Strain OCh 323T                                TAS [1]
            Gram stain               negative                                       TAS [1]
            Cell shape               rod-shaped                                     TAS [1]
            Motility                 non-motile                                     TAS [1]
            Sporulation              non-sporulating                                NAS
            Temperature range        mesophile                                      NAS
            Optimum temperature      27-30°C                                        TAS [1]
            Salinity                 0.5-7.5% NaCl                                  TAS [1]
MIGS-22     Oxygen requirement       aerobic                                        TAS [1]
            Carbon source            complex                                        NAS
            Energy metabolism        chemoheterotroph/photoheterotroph              TAS [1]
MIGS-6      Habitat                  sand, seawater                                 TAS [1]
MIGS-15     Biotic relationship      free-living                                    TAS [1]
MIGS-14     Pathogenicity            none                                           NAS
            Biosafety level          1                                              TAS [20]
MIGS-23.1   Isolation                sand                                           TAS [1]
MIGS-4      Geographic location      Monkey Mia, Shark Bay, Australian west coast   TAS [1]
MIGS-5      Sample collection time   not reported
MIGS-4.1    Latitude - Longitude     -25.789 - 113.721                              NAS
MIGS-4.3    Depth                    not reported
MIGS-4.4    Altitude                 not reported




Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS:
Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a
generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene
Ontology project [21].
Genome sequencing and annotation

Genome project history 

The genome of strain DSM 19469T was sequenced within the DFG funded project "Ecology,
Physiology and Molecular Biology of the Roseobacter group: Towards a Systems Biology
Understanding of a Globally Important Clade of Marine Bacteria". The strain was chosen
for genome sequencing according to the Genomic Encyclopedia of Bacteria and Archaea
(GEBA) criteria [27,28].

Project information can be found in the Genomes OnLine Database [29]. The Whole Genome
Shotgun (WGS) sequence is deposited in GenBank and the Integrated Microbial Genomes
database (IMG) [30]. A summary of the project information is shown in Table 2.



Table 2. Genome sequencing project information

MIGS ID     Property                     Term
MIGS-31     Finishing quality            finished
MIGS-28     Libraries used               Two genomic libraries: one Illumina PE library (441 bp insert
                                         size), one 454 PE library (3 kb insert size)
MIGS-29     Sequencing platforms         Illumina GA IIx, Illumina MiSeq
MIGS-31.2   Sequencing coverage          93×
MIGS-30     Assemblers                   Velvet version 1.1.36, Newbler version 2.3, Consed 20.0
MIGS-32     Gene calling method          Prodigal 1.4
            INSDC ID                     CP004372
            GenBank Date of Release      pending publication
            GOLD ID                      Gi21384
            NCBI project ID              189501
            Database: IMG                2522572126
MIGS-13     Source material identifier   DSM 19469T
            Project relevance            Tree of Life, biodiversity



Growth conditions and DNA isolation 

A culture of strain DSM 19469T was grown aerobically in DSMZ medium 514 [31] at 28°C.
Genomic DNA was isolated using the Jetflex Genomic DNA Purification Kit (GENOMED
600100) following the standard protocol provided by the manufacturer but modified by
an incubation time of 60 min, incubation on ice overnight on a shaker, the use of an
additional 50 µl proteinase K, and the addition of 100 µl protein precipitation
buffer. DNA is available from the DSMZ through the DNA Network [32].

Genome sequencing and assembly 

The genome was sequenced using a combination of two libraries (Table 2). Illumina
sequencing was performed on a GA IIx platform with 150 cycles. The paired-end library
contained inserts of an average of 441 bp in length. The first run delivered 2.7
million reads. To increase the sequencing depth, a second Illumina run was performed,
providing another 1.2 million reads. After error correction and clipping by fastq-mcf
[33] and Quake [34], the data were assembled using Velvet [35]. The first draft
assembly from 1,753,098 filtered reads with an average read length of 89 bp resulted
in 97 contigs.

To gain information on the contig arrangement an additional 454 run was performed. The
paired-end jumping library of 3 kb insert size was sequenced on a 1/8 lane.
Pyrosequencing resulted in 174,493 reads with an average read length of 360 bp, which
were assembled with Newbler (Roche Diagnostics). The resulting draft assembly consisted
of 22 scaffolds. Both draft assemblies (Illumina and 454 sequences) were fractionated
into artificial Sanger reads 1,000 bp in length plus 75 bp overlap on each side. These
artificial reads served as an input for the phred/phrap/consed package [36]. In
combination the assembly resulted in 39 contigs organized in four scaffolds.
Subsequently, small unlocalized contigs were mapped to the scaffolds using both
minimus2 [37] and NUCmer [38]. By manual editing the number of contigs could be
reduced to 21, organized in one chromosomal scaffold. The remaining ordered gaps were
closed by bridging PCR fragments and primer walking. A total of 50 reactions were
required to conclude the assembly process. The genome was sequenced with a 93×
coverage.
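
The fractionation of a draft contig into artificial Sanger-like reads can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name is hypothetical, and the code assumes that "1,000 bp in length plus 75 bp overlap" means consecutive 1,000 bp windows that are each extended by 75 bp on both sides, so that neighboring reads share 150 bp.

```python
from typing import List


def fragment_contig(seq: str, length: int = 1000, overlap: int = 75) -> List[str]:
    """Cut a contig into overlapping artificial reads.

    Each read covers one `length`-bp window extended by `overlap` bp on both
    sides; extensions are clipped at the contig ends.
    """
    reads = []
    for start in range(0, len(seq), length):
        lo = max(0, start - overlap)
        hi = min(len(seq), start + length + overlap)
        reads.append(seq[lo:hi])
    return reads
```

Under this reading, a 2,500 bp contig yields three artificial reads, and the shared flanks give an overlap-based assembler enough sequence to rejoin adjacent fragments.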
Genome annotation

Genes were identified using Prodigal [39] as part of the JGI genome annotation
pipeline. The predicted CDSs were translated and used to search the National Center
for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam,
PRIAM, KEGG, COG, and InterPro databases. Identification of RNA genes was carried out
by using HMMER 3.0rc1 [40] (rRNAs) and tRNAscan-SE 1.23 [41] (tRNAs). Other non-coding
genes were predicted using INFERNAL 1.0.2 [42]. Additional gene prediction analysis
and functional annotation was performed within the Integrated Microbial Genomes -
Expert Review (IMG-ER) platform [43]. CRISPR elements were detected using CRT [44] and
PILERCR [45].

Genome properties 

The genome statistics are provided in Figure 3 and Table 3. The genome has a total
length of 3,555,109 bp and a G+C content of 65.7%. Of the 3,599 genes predicted, 3,540
were identified as protein-coding, and 59 as RNAs. The majority of the protein-coding
genes were assigned a putative function (79.6%) while the remaining ones were
annotated as hypothetical proteins. The distribution of genes into COG functional
categories is presented in Table 4.
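
The percentages quoted for the genome can be recomputed from the raw counts. The short Python sketch below does this for a few entries; the variable and function names are illustrative, while the numbers are the ones reported in the text and Table 3.

```python
GENOME_SIZE_BP = 3_555_109
CODING_REGION_BP = 3_196_272
GC_BP = 2_337_214

TOTAL_GENES = 3_599
PROTEIN_CODING = 3_540
RNA_GENES = 59
WITH_FUNCTION_PREDICTION = 2_864


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100 * part / whole, 2)


gene_shares = {
    "protein-coding": percent(PROTEIN_CODING, TOTAL_GENES),
    "RNA": percent(RNA_GENES, TOTAL_GENES),
    "function prediction": percent(WITH_FUNCTION_PREDICTION, TOTAL_GENES),
}
sequence_shares = {
    "coding region": percent(CODING_REGION_BP, GENOME_SIZE_BP),
    "G+C": percent(GC_BP, GENOME_SIZE_BP),
}
```

Note that the gene percentages are relative to the total gene count (3,599), whereas the coding-region and G+C percentages are relative to the genome length.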




Figure 3. Graphical map of the chromosome. From outside to center: Genes on forward strand (colored by 
COG categories), Genes on reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs 
red, other RNAs black), GC content (black), GC skew (purple/olive). 
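
The two innermost tracks of such a map are usually derived from sliding windows along the chromosome. The following Python sketch shows the standard definitions, GC content as (G+C)/N and GC skew as (G-C)/(G+C); the window and step sizes are free parameters chosen by the caller, and nothing here is taken from the software that produced Figure 3.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); defined as 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, float, float]]:
    """Yield (start, GC content, GC skew) for each full window."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)
```

On a circular chromosome the sign of the GC skew typically changes near the replication origin and terminus, which is why the skew track is drawn in two colors.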
Table 3. Genome Statistics

Attribute                                    Value        % of Total
Genome size (bp)                             3,555,109    100.00
DNA coding region (bp)                       3,196,272    89.91
DNA G+C content (bp)                         2,337,214    65.74
Number of scaffolds MIGS-9                   1
Extrachromosomal elements MIGS-10            0
Total genes                                  3,599        100.00
RNA genes                                    59           1.64
rRNA operons                                 2
tRNA genes                                   45           1.25
Protein-coding genes                         3,540        98.36
Genes with function prediction (proteins)    2,864        79.58
Genes in paralog clusters                    2,870        79.74
Genes assigned to COGs                       2,750        76.41
Genes assigned Pfam domains                  2,942        81.74
Genes with signal peptides                   291          8.09
Genes with transmembrane helices             827          22.98
CRISPR repeats                               0







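Table 4. Number of genes associated with the general COG functional categories

Code   Value         %age          Description
J      [illegible]   [illegible]   Translation, ribosomal structure and biogenesis
A      0             0.0           RNA processing and modification
K      149           5.0           Transcription
L      133           4.4           Replication, recombination and repair
B      4             0.1           Chromatin structure and dynamics
D      25            0.8           Cell cycle control, cell division, chromosome partitioning
Y      0             0.0           Nuclear structure
V      32            1.1           Defense mechanisms
T      88            2.9           Signal transduction mechanisms
M      191           6.4           Cell wall/membrane/envelope biogenesis
N      35            1.2           Cell motility
Z      1             0.0           Cytoskeleton
W      0             0.0           Extracellular structures
U      52            1.7           Intracellular trafficking, secretion, and vesicular transport
O      124           4.1           Posttranslational modification, protein turnover, chaperones
C      217           7.3           Energy production and conversion
G      174           5.8           Carbohydrate transport and metabolism
E      338           11.3          Amino acid transport and metabolism
F      83            2.8           Nucleotide transport and metabolism
H      143           4.8           Coenzyme transport and metabolism
I      134           3.5           Lipid transport and metabolism
P      167           5.6           Inorganic ion transport and metabolism
Q      90            3.0           Secondary metabolites biosynthesis, transport and catabolism
R      379           12.7          General function prediction only
S      272           9.1           Function unknown
-      849           23.6          Not in COGs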
Genomic insights

Whole genome sequencing of strain R. elongatum DSM 19469T revealed a complete and
finished genome of 3,555,109 bp, which seems to be the smallest completed genome among
representatives of the Roseobacter group to date [46]. Two other isolates, Loktanella
vestfoldensis SKA53 and Sulfitobacter sp. EE-36, both have genomes shorter than that
of strain DSM 19469T, but these remain in draft state. Whereas many members of the
Roseobacter group contain plasmids [47], no extrachromosomal elements could be
detected in strain DSM 19469T.
The fraction of shared genes between strain R. elongatum DSM 19469T and the
neighboring strains D. shibae DFL-12 (DSM 16493T) [11,48] (Figure 1) and Jannaschia
sp. CCS1 (which also turned out to have similar genomic characteristics in the course
of this study), both members of the Roseobacter group, is shown in a Venn diagram
(Figure 4). The number of pairwise shared genes was inferred from the phylogenetic
profiler of the IMG-ER platform [43]. Homologous genes were detected with an E-value
cutoff of 10^-5 and a minimum identity of 30%.

A total of 2,287 genes are shared by all three genomes, corresponding to 54.3% and
53.4% of the gene count in D. shibae DSM 16493T [11,48] and Jannaschia sp. CCS1,
respectively. With only 3.5 Mbp in length, the genome of R. elongatum DSM 19469T
shares more than 64.6% of its genes with the other two genomes. A total of 645 genes
that have no homologs in the other genomes were detected, including a sensor protein
of blue-light using FAD (BLUF, roselon_02123) and the Phn gene cluster
(roselon_02168-79) involved in the uptake and degradation of phosphonates.

Phages

Phages are widely distributed and common in marine environments [49-51]. Horizontal
gene transfer of the phage genome and its integration in the host genome are known to
drive bacterial diversity [51,52]. In the genome sequence of R. elongatum DSM 19469T
several putative phage-associated gene sequences were detected, particularly organized
in gene clusters (e.g., roselon_02355 - 02370).



[Figure 4: Venn diagram. Genome labels: R. elongatum (3540), D. shibae (4197),
Jannaschia sp. CCS1 (4197); region counts: 645, 220, 388, 2287, 1319, 1240, 368.]

Figure 4. Venn diagram (total numbers in parentheses) of R. elongatum DSM
19469T, D. shibae DSM 16493T and Jannaschia sp. CCS1.
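
How the regions of such a three-genome Venn diagram arise can be sketched in Python. The code is an illustration under stated assumptions, not the IMG-ER phylogenetic profiler: it assumes that genes have already been mapped to shared homolog identifiers (hypothetical here) using the reported thresholds (E-value at most 10^-5, identity at least 30%), and the function and key names are invented for this example.

```python
from typing import Dict, Set

E_VALUE_CUTOFF = 1e-5   # threshold reported in the text
MIN_IDENTITY = 30.0     # percent identity, as reported in the text


def is_homolog(evalue: float, identity: float) -> bool:
    """Apply the two reported thresholds to a single pairwise hit."""
    return evalue <= E_VALUE_CUTOFF and identity >= MIN_IDENTITY


def venn_regions(a: Set[str], b: Set[str], c: Set[str]) -> Dict[str, int]:
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b": len((a & b) - c),
        "a_c": len((a & c) - b),
        "b_c": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


# Reported counts for the four regions that make up the R. elongatum circle:
# unique genes, two pairwise-only regions, and the three-genome core.
R_ELONGATUM_REGIONS = [645, 220, 388, 2287]
CORE_SHARE = round(100 * 2287 / sum(R_ELONGATUM_REGIONS), 1)
```

The four reported R. elongatum regions add up to its 3,540 protein-coding genes, and the core of 2,287 genes corresponds to the 64.6% quoted in the text.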
Quorum sensing

Quorum sensing (QS) is a cell-to-cell communication system in which bacteria interact
with each other depending on their population density. Gram-negative bacteria use
small signal molecules called autoinducers, which are produced, excreted through the
bacterial membrane and detected by conspecific bacteria. Consequently, when the
concentration of those membrane-diffusible autoinducers reaches a specific threshold
value, the population responds with an activation of gene expression to coordinate a
population-wide behavior [53-58]. QS was first detected in the marine
gammaproteobacterium Vibrio fischeri, a species often found to live in symbiosis with
squids or fishes. Here, the autoinducer accumulation and the activation of certain
genes result in bioluminescence [59,60]. Other examples of QS-induced bacterial
physiological traits are biofilm formation, exopolysaccharide production and virulence
[53,61]. Interestingly, many representatives of the Roseobacter group were shown to
encode and/or express gene sequences associated with QS (e.g., [62-65]).

Genome analysis of strain R. elongatum DSM 19469T revealed the presence of genes
putatively associated with QS, such as an N-acyl-L-homoserine lactone synthetase (LuxI
homolog; roselon_01555) and a regulator of the LuxR family (roselon_3097).

Photosynthetic gene cluster 

Light is used as an energy source by many bacteria in the ocean. An increasing number
of representatives belonging to the Roseobacter group have been found to be aerobic
anoxygenic photoheterotrophs, containing bacteriochlorophyll a (Bchl a) [3,4,66-69].
They transform light energy into a proton motive force (pmf) across the membrane that
is used for the generation of ATP, which could be of importance for marine
environments and global cycles [66-68]. Aerobic anoxygenic photoheterotrophs represent
a significant fraction of the microbial population depending on the location [69-73].
It was further shown that aerobic anoxygenic photoheterotrophs synthesize Bchl a only
in the presence of oxygen [66,74] and that the photosynthetic pigments of aerobic
alphaproteobacteria are synthesized under dark conditions [75-77], whereas some
members of the gammaproteobacterial OM60/NOR5 clade also synthesize pigments in the
light [78]. Furthermore, Elsen and colleagues reported that genes encoding the
photosynthetic apparatus and related genes are mainly organized in a large gene
cluster [79].

In the description of strain OCh 323T, the authors showed that the absorption spectrum
of the membranes of ultrasonically disrupted cells exhibits a significant
photosynthetic reaction center absorption peak (at 800 nm) and a light-harvesting
complex I absorption peak (at 879 nm) [1].

The genome sequence of strain R. elongatum DSM 19469T encodes a functional
photosynthetic gene cluster (roselon_01064 - 01096) containing a set of bch genes, puf
genes, crt genes, hem genes and genes for proteins with sensory activity (Figure 5).



Figure 5. Arrangement of the photosynthetic gene cluster. Green, bch genes; red, puf genes; orange, crt genes; blue,
hem genes; purple, genes for sensor proteins; white, other genes (adapted after [77,80]).



Motility and flagellar genes 

Strain R. elongatum DSM 19469T was originally described as non-flagellated [1]. In the
genome a flagella gene cluster was found flanking the chromosome-partitioning gene
dnaA (roselon_1273). Flagella formation depends on external stimuli such as incubation
temperature or composition of the media [81]. Thus, strain DSM 19469T might exhibit a
motile phenotype under certain, as yet unknown, conditions. Flagellar genes of strain
DSM 19469T involved in flagellar assembly and function were analyzed to assess
potential motility behavior. The cluster consists of 28 genes (roselon_01279 - 01316).
Three further motor switch proteins, including fliG, were detected upstream of
roselon_03222. Together with fliM (roselon_03295) and fliN (roselon_01309), fliG forms
a protein complex that controls the rotation behavior of flagella. This dissociation
of flagellar operons has been seen in two groups of alphaproteobacteria [82]. No
master regulator operon (flhDC) [83] could be detected. Whereas genes controlling the
early flagellum assembly were not detected, several proteins necessary for the
formation of the basal body were found, including flgDEFGHIKL and fliF. Genome
analysis of strain DSM 19469T further revealed the presence of genes involved in the
formation of the export apparatus: the previously mentioned C-ring forming complex
fliGMN and the protein-encoding sequences flhA, flhB, fliP, fliQ and fliR, which are
involved in pore formation through the membrane [84]. Whereas two motor
protein-encoding gene sequences, motAB, were found (roselon_01316, roselon_01313), a
homolog of the fliO gene as part of the channel-forming apparatus was absent.
Additionally, the genome of strain R. elongatum DSM 19469T revealed the presence of
regulatory genes controlling the late phase, such as the hook capping protein
(roselon_01279), the flagellar hook-length control protein (roselon_01280) and the
flagellin-encoding gene sequence fliC (roselon_01284). Methyl-accepting chemotaxis
proteins that sense external stimuli, and therefore direct flagella-induced motility
of strain DSM 19469T, could not be detected.

To compare the flagellar gene clusters of neighboring species (Figure 6), homologs of
flgG, coding for a protein mainly involved in the formation of the basal body in
Rhodobacter sphaeroides ATCC 17029 [85], were identified using the IMG/ER platform
[43]. All compared genomes show a similar gene cluster structure, but have variations
such as differences in gene length for fliK, which controls the completion of previous
flagellum-assembly steps. The FliK protein in R. sphaeroides is 700 amino-acid
residues (AA) in length [85]. A genome BLAST search
(minimal similarity 30%, maximal e- value 10"^) 
against putative FliK proteins revealed that the length of FliK varies from 102 AA in
R. sphaeroides strains WS8N and 2.4.1 to 937 AA in Citreicella sp. SE45. The genomes
of the three species Salipiger mucosus, Sagittula stellata and Pelagibaca bermudensis
each encode a truncated fliK gene sequence, and those strains do not form flagella
[86-88]. These truncations could result in inactive proteins and thus in a non-motile
phenotype. In contrast, the genome of Jannaschia sp. CCS1 codes for a FliK protein of
612 AA (Jann_4206) and, interestingly, this strain was reported to be motile.

The second marked region (Figure 6) is well conserved in the first four genomes, but
is missing in strain R. sphaeroides ATCC 17029. This cluster consists of the
rod-forming gene flgJ and three proteins involved in the regulation of the flagella
assembly. Homologs of the R. elongatum DSM 19469T flagellin gene (roselon_01284) are
absent in R. sphaeroides. Thus, the regulation of the flagella operon might be
conducted by other genes: one of the genes coding for the flagellin-forming FliC in
R. sphaeroides is located on the chromosome within the flagellar cluster. An
additional set of three regulation genes is detected on the 120 kb plasmid (NC_009040)
of the genome. In area 3 of Figure 6 the genomes of both S. stellata and Citreicella
sp. lack three flagellar genes: fliL and fliF, which are both involved in the
formation of the basal body, and fliP (export apparatus). An additional PAS/PAC sensor
hybrid histidine kinase (Rsph17029_2967) is found in the R. sphaeroides genome.

Morphological traits 

The genome sequence of strain R. elongatum DSM 19469T was found to have specific genes
associated with the putative biosynthesis and export of exopolysaccharides
(roselon_01150, roselon_01343 - 01343) and the putative export of capsule
polysaccharides (e.g., roselon_00513, roselon_01783 - 01785).

Additionally, the genome of strain R. elongatum DSM 19469T encodes several gene
sequences associated with Flp-type pili biogenesis and formation (e.g., roselon_01843
- 01852). Hence, the formed pili might play a role in adhesion or twitching-type
motility on solid surfaces.

Further, strain R. elongatum DSM 19469T seems to accumulate polyhydroxyalkanoates as
storage compounds (e.g., roselon_00211 - 00214).

Metabolic plasticity 

The genome sequence of strain R. elongatum DSM 19469T encodes a gene cluster
associated with a Sox multienzyme complex (roselon_02191 - 02202) that could be
utilized for the oxidation of thiosulfate to sulfate. Carbon monoxide could
putatively be oxidized by aerobic-type carbon monoxide dehydrogenases (roselon_01738,
roselon_01976 - 01977, roselon_02472, roselon_02474).

Several genes play a role in the electron transport chain, such as those associated
with the NADH dehydrogenase (e.g., roselon_00011 - 00023), succinate dehydrogenase
(roselon_01681 - 01684) and cytochrome bd ubiquinol oxidase (roselon_00027 - 00028).
In addition, two different cytochrome c oxidases (caa3-type [e.g., roselon_02733 -
02734] and cbb3-type [roselon_00626 - 00628]) were detected.
Figure 6. Map of the flagella cluster of R. elongatum DSM 19469T (roselon_Rosei_p5_w02) and homologous
ORFs in the genomes of the four comparable strains Jannaschia sp. CCS1 (NC_007802), S. stellata E-37
(NZ_AAYA01000005), R. sphaeroides ATCC 17029 (NC_009049) and Citreicella sp. SE45 (NZ_GG704601).
Prediction of homologs was conducted using the conserved-neighborhood tool of the IMG-ER platform [43].
The colored areas represent differences in the genomic structure within the flagella cluster.